Massively-Parallel Similarity Join, Edge-Isoperimetry, and Distance Correlations on the Hypercube

Authors

  • Paul Beame
  • Cyrus Rashtchian
Abstract

We study distributed protocols for finding all pairs of similar vectors in a large dataset. Our results pertain to a variety of discrete metrics, and we give concrete instantiations for Hamming distance. In particular, we give improved upper bounds on the overhead required for similarity defined by Hamming distance r > 1 and prove a lower bound showing qualitative optimality of the overhead required for similarity over any Hamming distance r. Our main conceptual contribution is a connection between similarity search algorithms and certain graph-theoretic quantities. For our upper bounds, we exhibit a general method for designing one-round protocols using edge-isoperimetric shapes in similarity graphs. For our lower bounds, we define a new combinatorial optimization problem, which can be stated in purely graph-theoretic terms yet also captures the core of the analysis in previous theoretical work on distributed similarity joins. As one of our main technical results, we prove new bounds on distance correlations in subsets of the Hamming cube.
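To make the objects in the abstract concrete, here is a small single-machine sketch. It is not the authors' one-round distributed protocol or their lower-bound argument. It is only a brute-force reference for the problem and the quantities involved. Two choices are our own assumptions. First, "similar" is taken to mean Hamming distance at most r. Second, vectors are bit tuples. The helper names (`similarity_join`, `pairs_at_distance`, `subcube`) are hypothetical. Edges of the Hamming cube are the pairs at distance 1. Counting the pairs at distance exactly r inside a subset is a simple stand-in for a distance correlation.

```python
from itertools import combinations, product


def hamming(u, v):
    """Number of coordinates where the bit tuples u and v differ."""
    return sum(a != b for a, b in zip(u, v))


def similarity_join(vectors, r):
    """Brute-force join (assumption: 'similar' means distance <= r).

    Returns all index pairs (i, j), i < j, with hamming(vectors[i], vectors[j]) <= r.
    This compares every pair, i.e. quadratic work on one machine.
    """
    return [(i, j) for i, j in combinations(range(len(vectors)), 2)
            if hamming(vectors[i], vectors[j]) <= r]


def pairs_at_distance(subset, r):
    """Number of unordered pairs in `subset` at Hamming distance exactly r.

    With r = 1 this counts the hypercube edges induced by the subset.
    """
    return sum(1 for u, v in combinations(subset, 2) if hamming(u, v) == r)


def subcube(n, k):
    """The 2^k points of {0,1}^n whose last n - k coordinates are 0."""
    return [bits + (0,) * (n - k) for bits in product((0, 1), repeat=k)]


if __name__ == "__main__":
    vecs = [(0, 0, 0, 0), (1, 0, 0, 0), (1, 1, 0, 0), (1, 1, 1, 1)]
    print(similarity_join(vecs, 1))            # [(0, 1), (1, 2)]

    # Two 4-point sets in {0,1}^4: a 2-dimensional subcube and a "star".
    cube = subcube(4, 2)
    star = [(0, 0, 0, 0), (1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0)]
    print(pairs_at_distance(cube, 1))          # 4 induced edges
    print(pairs_at_distance(star, 1))          # 3 induced edges
```

The comparison at the end shows a classical edge-isoperimetric fact on the hypercube. Among subsets of size 2^k, a k-dimensional subcube induces the most edges, namely k·2^(k-1). That is 4 for k = 2, compared with 3 for the star. The abstract's upper bounds rely on edge-isoperimetric shapes of this kind, though in the paper they are taken in more general similarity graphs.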

Similar resources

A New Parallel Matrix Multiplication Method Adapted on Fibonacci Hypercube Structure

The objective of this study was to develop a new optimal parallel algorithm for matrix multiplication that can run on a Fibonacci Hypercube structure. Most of the popular algorithms for parallel matrix multiplication cannot run on a Fibonacci Hypercube structure; therefore, a method that can run on all structures, especially the Fibonacci Hypercube structure, is necessary for parallel matr...

On a New Multicomputer Interconnection Topology for Massively Parallel Systems

This paper introduces a new interconnection network topology called Balanced Varietal Hypercube (BVH), suitable for massively parallel systems. The proposed topology, being a hybrid structure, retains almost all the attractive properties of the Balanced Hypercube and the Varietal Hypercube. The topology, various parameters, routing and broadcasting of Balanced Varietal Hypercube are presented. The perfor...

Data Communication and Parallel Computing on Twisted Hypercubes

Massively parallel distributed-memory architectures are receiving increasing attention to meet the increasing demand on processing power. Many topologies have been proposed for interconnecting the processors of distributed computing systems. The hypercube topology has drawn considerable attention due to many of its attractive properties. The appealing properties of the hypercube topology such a...

Heads-Join: Efficient Earth Mover's Distance Similarity Joins on Hadoop

The Earth Mover’s Distance (EMD) similarity join has a number of important applications such as near duplicate image retrieval and distributed based pattern analysis. However, the computational cost of EMD is super cubic and consequently the EMD similarity join operation is prohibitive for datasets of even medium size. We propose to employ the Hadoop platform to speed up the operation. Simply p...

Deadlock - Free Routing in a Faulty Hypercube

In massively parallel computers, processors are often connected in a hypercube configuration. Each vertex in the hypercube represents a processor, and each edge represents a communication link. One problem in such a system is avoiding deadlock, a state where there is a cycle of processors, each waiting on the next indefinitely. A second problem is that in a system with many processors, some are...

Publication date: 2017